
Use lazy descriptor pool allocation #2285

Closed
wants to merge 1 commit

Conversation

@SS-JIA (Contributor) commented Mar 6, 2024

Summary:

## Context

In Vulkan, memory for Descriptor Sets (which are used to bind data to shader arguments) must be pre-allocated. Previously, the convention was to allocate a large number of descriptor sets upon creation of a Vulkan Context. While this worked well in the Lite Interpreter, where only a single global Vulkan Context is used, it leads to overallocating descriptor sets in the Vulkan Delegate, where every `ComputeGraph` has its own dedicated Context.

pytorch/pytorch#121134 allows the Descriptor Set pool to be initialized in a deferred fashion. This means that a ComputeGraph can count the total number of descriptors needed across all the compute shaders that will be encoded, and then allocate a Descriptor Set Pool of the appropriate size.
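
For context on why deferral matters, the sketch below shows the raw Vulkan mechanism: a `VkDescriptorPool` has fixed per-type capacities set at creation time, so creating it only after the counts are known lets it be sized exactly. This is a minimal sketch against the plain Vulkan API, not the actual `api::Context` code; the `DescriptorCounts` struct is a hypothetical stand-in for whatever bookkeeping the graph uses.

```cpp
#include <cstdint>
#include <vulkan/vulkan.h>

// Hypothetical tally of the graph's descriptor requirements; the field
// names are illustrative, not the actual ExecuTorch API.
struct DescriptorCounts {
  uint32_t max_sets = 0;
  uint32_t uniform_buffers = 0;
  uint32_t storage_buffers = 0;
  uint32_t combined_samplers = 0;
  uint32_t storage_images = 0;
};

// A descriptor pool's capacity is fixed at creation time, which is why
// deferring vkCreateDescriptorPool until the counts are known avoids
// having to guess a large upfront size.
VkDescriptorPool create_sized_pool(VkDevice device, const DescriptorCounts& c) {
  const VkDescriptorPoolSize pool_sizes[] = {
      {VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER, c.uniform_buffers},
      {VK_DESCRIPTOR_TYPE_STORAGE_BUFFER, c.storage_buffers},
      {VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER, c.combined_samplers},
      {VK_DESCRIPTOR_TYPE_STORAGE_IMAGE, c.storage_images},
  };

  VkDescriptorPoolCreateInfo info{};
  info.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_POOL_CREATE_INFO;
  info.maxSets = c.max_sets;  // total number of sets this pool can hand out
  info.poolSizeCount = 4u;
  info.pPoolSizes = pool_sizes;

  VkDescriptorPool pool = VK_NULL_HANDLE;
  vkCreateDescriptorPool(device, &info, nullptr, &pool);
  return pool;
}
```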

## Implementation Overview

1. When constructing a `ComputeGraph`, make sure that the descriptor pool config contains 0 for the max number of sets. This ensures that no descriptor pool is initialized when constructing the graph's `api::Context` instance.
2. When building the graph, `ExecuteNode` and `PrepackNode` call `graph.update_descriptor_counts(shader)` upon construction, which allows `ComputeGraph` to count the total number of descriptor sets needed (see the sketch after this list).
3. There are separate descriptor count objects for prepack and execute, since they correspond to different command buffers.
4. Before encoding any command buffers, call `graph.prepare()`, which constructs a descriptor pool config from the descriptor counts.
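
As an illustration of step 2, the counting could look something like the sketch below, reusing the hypothetical `DescriptorCounts` struct from the previous sketch. The `ShaderLayout` alias and the function signature are assumptions for illustration; the real `update_descriptor_counts` in ExecuTorch may differ.

```cpp
#include <vector>
#include <vulkan/vulkan.h>

// Illustrative stand-in: a shader's descriptor set layout expressed as
// the descriptor type of each binding slot.
using ShaderLayout = std::vector<VkDescriptorType>;

// Called once per ExecuteNode/PrepackNode at construction time, so the
// graph knows the total pool size before any command buffer is encoded.
void update_descriptor_counts(DescriptorCounts& counts, const ShaderLayout& layout) {
  counts.max_sets += 1;  // one descriptor set per shader dispatch
  for (const VkDescriptorType type : layout) {
    switch (type) {
      case VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER:         counts.uniform_buffers++;   break;
      case VK_DESCRIPTOR_TYPE_STORAGE_BUFFER:         counts.storage_buffers++;   break;
      case VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER: counts.combined_samplers++; break;
      case VK_DESCRIPTOR_TYPE_STORAGE_IMAGE:          counts.storage_images++;    break;
      default: break;  // other descriptor types are not tallied here
    }
  }
}
```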

## Notes

One interesting finding is that I had to apply a safety factor to the descriptor counts to prevent the pool from running out of memory. This was reproducible on both Linux and Android.
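
For illustration, such a safety factor amounts to padding each tallied count by a multiplier along these lines; the factor of 1.25 here is an arbitrary placeholder, not the value used in this changeset:

```cpp
#include <cstdint>

// Hypothetical safety factor; the actual value in the changeset may differ.
constexpr float kSafetyFactor = 1.25f;

uint32_t padded_count(uint32_t tallied) {
  // Round up so a nonzero tally always gains at least some headroom.
  return static_cast<uint32_t>(tallied * kSafetyFactor) + 1u;
}
```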

A more robust design, as discussed [here](https://www.reddit.com/r/vulkan/comments/17v66fi/question_about_descriptor_pool_allocations/), may be to maintain a separate descriptor pool for each layout type. We should revisit this refactor at a later time.
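
As a rough sketch of that alternative (drawn from the linked discussion, not from this PR's code), each `VkDescriptorSetLayout` would get its own dedicated pool, so sets of one shape can never starve sets of another:

```cpp
#include <unordered_map>
#include <vulkan/vulkan.h>

// Hypothetical: one descriptor pool per set layout. Sizing is exact
// because every set allocated from a given pool has the same shape, and
// exhausting one layout's pool cannot affect any other layout.
class PerLayoutPools {
 public:
  explicit PerLayoutPools(VkDevice device) : device_(device) {}

  VkDescriptorPool pool_for(
      VkDescriptorSetLayout layout,
      const VkDescriptorPoolCreateInfo& config) {
    auto it = pools_.find(layout);
    if (it == pools_.end()) {
      // Lazily create the pool for this layout on first use.
      VkDescriptorPool pool = VK_NULL_HANDLE;
      vkCreateDescriptorPool(device_, &config, nullptr, &pool);
      it = pools_.emplace(layout, pool).first;
    }
    return it->second;
  }

 private:
  VkDevice device_;
  std::unordered_map<VkDescriptorSetLayout, VkDescriptorPool> pools_;
};
```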

Differential Revision: D54603935


pytorch-bot bot commented Mar 6, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/2285

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 1 Unrelated Failure

As of commit 1c16288 with merge base 0570294:

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 6, 2024
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D54603935

SS-JIA added a commit to SS-JIA/executorch-1 that referenced this pull request Mar 7, 2024

@facebook-github-bot

This pull request has been merged in aed32c4.

facebook-github-bot pushed a commit that referenced this pull request Mar 13, 2024
Summary:
Pull Request resolved: #2398

## Context

While testing a toy model with a large number of operators, I ran into an issue on my local Pixel 6 Android device where the descriptor pool was running out of memory. This changeset implements a simple fix to ensure that descriptor pools do not run into this issue.

A longer-term solution is to implement layout-specific descriptor pools, but that is much more technically complex, so we go with this for now.

## Problem Details

#2285 made it so that `ComputeGraph` could tally up the total number of descriptors needed and size the descriptor pools appropriately, but it seems that this is not compatible with certain Vulkan drivers.

The toy model contains 1000 binary operators. Counting the descriptors required for the graph yields the following counts:

```
descriptorPoolMaxSets: 1255
descriptorUniformBufferCount: 5013
descriptorStorageBufferCount: 4
descriptorCombinedSamplerCount: 2504
descriptorStorageImageCount: 1254
```

These counts appear to be correct; however, the descriptor pool still runs out of memory due to an insufficient `descriptorStorageBufferCount`. The `descriptorStorageBufferCount` has to be set to a surprisingly high number (roughly 1000) before the descriptor pool stops running out of memory. I'm not sure exactly what causes this behaviour, but it could be due to implementation details of the driver.

## Solution

Ensuring that all descriptor counts are greater than or equal to the maximum number of descriptor sets seems to work, so this is implemented as a temporary solution; a minimal sketch of the idea follows.
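
A minimal sketch of the workaround, reusing the hypothetical `DescriptorCounts` struct from the sketches earlier in this thread (the actual fix lives in the ExecuTorch source):

```cpp
#include <algorithm>
#include <cstdint>

// Temporary workaround: some drivers exhaust the pool even when the
// tallied per-type counts look sufficient, so clamp every per-type
// count to at least the number of descriptor sets.
void apply_count_floor(DescriptorCounts& counts) {
  const uint32_t min_count = counts.max_sets;
  counts.uniform_buffers   = std::max(counts.uniform_buffers, min_count);
  counts.storage_buffers   = std::max(counts.storage_buffers, min_count);
  counts.combined_samplers = std::max(counts.combined_samplers, min_count);
  counts.storage_images    = std::max(counts.storage_images, min_count);
}
```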
ghstack-source-id: 218509680

bypass-github-export-checks
bypass-github-pytorch-ci-checks
bypass-github-executorch-ci-checks

Reviewed By: jorgep31415

Differential Revision: D54853788

fbshipit-source-id: 391e3d10a678672df9af96e3b6a8484453b039f1